{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Configure Colab"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 防止断连<br>Prevent Disconnection"
      ]
    },
    {
      "attachments": {},
      "cell_type": "raw",
      "metadata": {
        "id": "EblwyDLicmnp"
      },
      "source": [
        "按住 Ctrl+Shift 再按下 I 呼出浏览器的开发工具，于控制台内输入以下内容并回车\n",
        "function ConnectButton()\n",
        "{\n",
        "    console.log(\"Connect pushed\"); \n",
        "    document.querySelector(\"#top-toolbar > colab-connect-button\").shadowRoot.querySelector(\"#connect\").click()\n",
        "}\n",
        "setInterval(ConnectButton,60000);"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 使用GPU<br>Use GPU"
      ]
    },
    {
      "attachments": {},
      "cell_type": "raw",
      "metadata": {},
      "source": [
        "找到上方菜单栏“代码执行程序”——>“更改运行时类型”——>\"硬件加速器\"，选择GPU"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 克隆仓库<br>Clone Repository"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title Clone Repository\n",
        "!git clone https://github.com/Spr-Aachen/Easy-Voice-Toolkit.git\n",
        "%cd /content/Easy-Voice-Toolkit"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 安装依赖<br>Install Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title Install Dependencies\n",
        "!apt-get update``\n",
        "!apt-get install portaudio19-dev\n",
        "!pip3 install -r requirements.txt\n",
        "#!pip3 install --force-reinstall --yes torch torchvision torchaudio\n",
        "'''\n",
        "!apt-get install python3.9\n",
        "!cp -r /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.9/\n",
        "'''\n",
        "#exit() # Enable this only when you decide to delete the runtime"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 装载硬盘<br>Mount Google Drive"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title Mount Google Drive\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 准备文件<br>Prepare Files"
      ]
    },
    {
      "attachments": {},
      "cell_type": "raw",
      "metadata": {},
      "source": [
        "检查是否已将需要处理的文件上传到了 https://drive.google.com/drive/my-drive 中"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Run Tools"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_AudioProcessor\n",
        "该工具会将媒体文件批量转换为音频文件然后自动切除音频的静音部分"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] AudioProcessor 该工具会将媒体文件批量转换为音频文件然后自动切除音频的静音部分\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.Process.Process import Audio_Processing\n",
        "\n",
        "class Execute_Audio_Processing:\n",
        "    '''\n",
        "    Change media format to WAV and cut off the silent parts\n",
        "    '''\n",
        "    #@markdown **媒体输入目录**：需要输出为音频文件的媒体文件的目录\n",
        "    Media_Dir_Input: str = '/content/drive/MyDrive/%MediaInput%'   #@param {type:\"string\"}\n",
        "    #@markdown **媒体输出格式**：需要输出为的音频文件的格式\n",
        "    Media_Format_Output: str = 'wav'   #@param [\"flac\", \"wav\", \"mp3\", \"aac\", \"ogg\", \"m4a\", \"wma\", \"aiff\", \"au\"]\n",
        "    #@markdown **启用静音切除**：音频中的静音部分将被切除\n",
        "    Slice_Audio: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **均方根阈值 (db)**：低于该阈值的片段将被视作静音进行处理，若有降噪需求可以增加该值\n",
        "    RMS_Threshold: float = -40.   #@param {type:\"number\"}\n",
        "    #@markdown **跳跃大小 (ms)**：每个RMS帧的长度，增加该值能够提高分割精度但会减慢进程\n",
        "    Hop_Size: int = 10   #@param {type:\"integer\"}\n",
        "    #@markdown **最小静音间隔 (ms)**：静音部分被分割成的最小长度，若音频只包含短暂中断可以减小该值（注意：这个值必须小于 Audio Length Min，大于 Hop Size）\n",
        "    Silent_Interval_Min: int = 300   #@param {type:\"integer\"}\n",
        "    #@markdown **最大静音长度 (ms)**：被分割的音频周围保持静音的最大长度（提示：这个值无需完全对应被分割音频中的静音长度。算法将自行检索最佳的分割位置）\n",
        "    Silence_Kept_Max: int = 1000   #@param {type:\"integer\"}\n",
        "    #@markdown **最小音频长度 (ms)**：每个被分割的音频片段所需的最小长度\n",
        "    Audio_Length_Min: int = 3000   #@param {type:\"integer\"}\n",
        "    #@markdown **输出采样率**：输出音频所拥有的采样率，若维持不变则保持'None'即可\n",
        "    SampleRate: int = None   #@param [\"None\", 44100, 48000, 96000, 192000]\n",
        "    #@markdown **输出采样位数**：输出音频所拥有的采样位数，若维持不变则保持'None'即可\n",
        "    SampleWidth: int = None   #@param [\"None\", 8, 16, 24, 32]\n",
        "    #@markdown **合并声道**：将输出音频的声道合并为单声道\n",
        "    ToMono: bool = False   #@param {type:\"boolean\"}\n",
        "    #@markdown **输出目录**：用于保存最后生成的音频文件的目录\n",
        "    Media_Dir_Output: str = f'/content/drive/MyDrive/EVT/音频处理结果/{date.today()}'   #@param {type:\"string\"}\n",
        "\n",
        "AudioConvertandSlice = Audio_Processing(\n",
        "    Execute_Audio_Processing.Media_Dir_Input,\n",
        "    Execute_Audio_Processing.Media_Format_Output,\n",
        "    Execute_Audio_Processing.SampleRate if Execute_Audio_Processing.SampleRate != \"None\" else None,\n",
        "    Execute_Audio_Processing.SampleWidth if Execute_Audio_Processing.SampleWidth != \"None\" else None,\n",
        "    Execute_Audio_Processing.ToMono,\n",
        "    Execute_Audio_Processing.Slice_Audio,\n",
        "    Execute_Audio_Processing.RMS_Threshold,\n",
        "    Execute_Audio_Processing.Audio_Length_Min,\n",
        "    Execute_Audio_Processing.Silent_Interval_Min,\n",
        "    Execute_Audio_Processing.Hop_Size,\n",
        "    Execute_Audio_Processing.Silence_Kept_Max,\n",
        "    Path(Execute_Audio_Processing.Media_Dir_Output).parent.__str__(),\n",
        "    Path(Execute_Audio_Processing.Media_Dir_Output).name\n",
        ")\n",
        "AudioConvertandSlice.Process_Audio()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_VoiceIdentifier\n",
        "该工具会在不同说话人的音频中批量筛选出属于同一说话人的音频"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] VoiceIdentifier 该工具会在不同说话人的音频中批量筛选出属于同一说话人的音频\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.ASR.VPR.Identify import Voice_Identifying\n",
        "\n",
        "class Execute_Voice_Identifying:\n",
        "    '''\n",
        "    Contrast the voice and filter out the similar ones\n",
        "    '''\n",
        "    #@markdown **音频输入目录**：需要进行语音识别筛选的音频文件的目录\n",
        "    Audio_Dir_Input: str = '/content/drive/MyDrive/%...%'   #@param {type:\"string\"}\n",
        "    #@markdown **目标人物与音频**：目标人物的名字及其语音文件的所在路径\n",
        "    StdAudioSpeaker: dict = {'%SpeakerName%': '/content/drive/MyDrive/%StdAudio.wav%'}   #@param {type:\"raw\"}\n",
        "    #@markdown **模型加载路径**：用于加载的声纹识别模型的所在路径\n",
        "    Model_Path: str = '/content/drive/MyDrive/%EVT/Model_Download/ASR/VPR/Ecapa-Tdnn_spectrogram.pth%'\n",
        "    #@markdown **判断阈值**：判断是否为同一人的阈值，若参与比对的说话人声音相识度较高可以增加该值\n",
        "    DecisionThreshold: float = 0.75   #@param {type:\"number\"}\n",
        "    #@markdown **模型类型**：声纹识别模型的类型\n",
        "    Model_Type: str = 'Ecapa-Tdnn'   #@param [\"Ecapa-Tdnn\"]\n",
        "    #@markdown **特征提取方法**：音频特征的提取方法\n",
        "    Feature_Method: str = 'spectrogram'   #@param [\"spectrogram\", \"melspectrogram\"]\n",
        "    #@markdown **音频长度**：用于预测的音频长度\n",
        "    Duration_of_Audio: float = 3.00   #@param {type:\"number\"}\n",
        "    #@markdown **输出目录**：用于保存最后生成的结果文件的目录\n",
        "    Output_Dir: str = f'/content/drive/MyDrive/EVT/语音识别结果/{date.today()}'   #@param {type:\"string\"}\n",
        "    #@markdown **识别结果文本名**：用于保存最后生成的记录音频文件与对应说话人的txt文件的名字\n",
        "    AudioSpeakersDataName: str = 'Recgonition'   #@param {type:\"string\"}\n",
        "\n",
        "import os, shutil\n",
        "def ASRResult_Update(AudioSpeakersData_Path: str, MoveToDst: str):\n",
        "    os.makedirs(MoveToDst, exist_ok = True) if Path(MoveToDst).exists() == False else None\n",
        "    with open(AudioSpeakersData_Path, mode = 'w', encoding = 'utf-8') as AudioSpeakersData:\n",
        "        AudioSpeakers = AudioSpeakersData.readlines()\n",
        "        Lines = []\n",
        "        for AudioSpeaker in AudioSpeakers:\n",
        "            Audio, Speaker = AudioSpeaker.split('|', maxsplit = 1)\n",
        "            if Speaker.strip() != '':\n",
        "                MoveToDst_Sub = Path(MoveToDst).joinpath(Speaker).as_posix()\n",
        "                os.makedirs(MoveToDst_Sub, exist_ok = True) if Path(MoveToDst_Sub).exists() == False else None\n",
        "                Audio_Dst = Path(MoveToDst_Sub).joinpath(Path(Audio).name).as_posix()\n",
        "                shutil.copy(Audio, MoveToDst_Sub) if not Path(Audio_Dst).exists() else None\n",
        "                Lines.append(f\"{Audio_Dst}|{Speaker}\\n\")\n",
        "        AudioSpeakersData.writelines(Lines)\n",
        "\n",
        "AudioContrastInference = Voice_Identifying(\n",
        "    Execute_Voice_Identifying.StdAudioSpeaker,\n",
        "    Execute_Voice_Identifying.Audio_Dir_Input,\n",
        "    Execute_Voice_Identifying.Model_Path,\n",
        "    Execute_Voice_Identifying.Model_Type,\n",
        "    Execute_Voice_Identifying.Feature_Method,\n",
        "    Execute_Voice_Identifying.DecisionThreshold,\n",
        "    Execute_Voice_Identifying.Duration_of_Audio,\n",
        "    Path(Execute_Voice_Identifying.Output_Dir).parent.__str__(),\n",
        "    Path(Execute_Voice_Identifying.Output_Dir).name,\n",
        "    Execute_Voice_Identifying.AudioSpeakersDataName\n",
        ")\n",
        "AudioContrastInference.GetModel()\n",
        "AudioContrastInference.Inference()\n",
        "ASRResult_Update(\n",
        "    Path(Execute_Voice_Identifying.Output_Dir).joinpath(Execute_Voice_Identifying.AudioSpeakersDataName) + \".txt\",\n",
        "    Execute_Voice_Identifying.Output_Dir\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_VoiceTranscriber\n",
        "该工具会将语音文件的内容批量转换为带时间戳的文本并以字幕文件的形式保存"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] VoiceTranscriber 该工具会将语音文件的内容批量转换为带时间戳的文本并以字幕文件的形式保存\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.STT.Whisper.Transcribe import Voice_Transcribing\n",
        "\n",
        "class Execute_Voice_Transcribing:\n",
        "    '''\n",
        "    Transcribe WAV content to SRT\n",
        "    '''\n",
        "    #@markdown **音频目录**：需要将语音内容转为文字的wav文件的目录\n",
        "    Audio_Dir: str = '/content/drive/MyDrive/%EVT/语音识别结果/...%'   #@param {type:\"string\"}\n",
        "    #@markdown **模型加载路径**：用于加载的Whisper模型的所在路径\n",
        "    Model_Path: str = '/content/drive/MyDrive/%EVT/Model_Download/STT/Whisper/small.pt%'   #@param {type:\"string\"}\n",
        "    #@markdown **标注语言信息**：标注音频中说话人所使用的语言，若用于VITS数据集制作则建议启用\n",
        "    Add_LanguageInfo: str = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **半精度训练**：主要使用半精度浮点数进行计算，若GPU不可用则忽略或禁用此项\n",
        "    fp16: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **启用输出日志**：是否输出debug日志\n",
        "    Verbose: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **关联上下文**：在音频之间的内容具有关联性时启用该项可以获得更好的效果，若模型陷入了失败循环则禁用此项\n",
        "    Condition_on_Previous_Text: bool = False   #@param {type:\"boolean\"}\n",
        "    #@markdown **输出目录**：最后生成的字幕文件将会保存到该目录中\n",
        "    Output_Dir: str = f'/content/drive/MyDrive/EVT/语音转录结果/{date.today()}'   #@param {type:\"string\"}\n",
        "\n",
        "WAVtoSRT = Voice_Transcribing(\n",
        "    Execute_Voice_Transcribing.Model_Path,\n",
        "    Execute_Voice_Transcribing.Audio_Dir,\n",
        "    Execute_Voice_Transcribing.Verbose,\n",
        "    Execute_Voice_Transcribing.Add_LanguageInfo,\n",
        "    Execute_Voice_Transcribing.Condition_on_Previous_Text,\n",
        "    Execute_Voice_Transcribing.fp16,\n",
        "    Path(Execute_Voice_Transcribing.Output_Dir).parent.__str__(),\n",
        "    Path(Execute_Voice_Transcribing.Output_Dir).name\n",
        ")\n",
        "WAVtoSRT.Transcriber()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_DatasetCreator - VITS2\n",
        "该工具会生成适用于语音模型训练的数据集"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] DatasetCreator 该工具会生成适用于语音模型训练的数据集\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.Dataset.VITS.Create import Dataset_Creating\n",
        "\n",
        "class Execute_Dataset_Creating:\n",
        "    '''\n",
        "    Convert the whisper-generated SRT and split the WAV\n",
        "    '''\n",
        "    #@markdown **音频文件目录/语音识别结果文件路径**：音频文件的所在目录（要求按说话人分类），或者提供由语音识别得到的文本文件的所在路径\n",
        "    AudioSpeakersData_Path: str = '/content/drive/MyDrive/%EVT/语音识别结果/...%'   #@param {type:\"string\"}\n",
        "    #@markdown **字幕输入目录**：需要转为适用于模型训练的csv文件的srt文件的目录\n",
        "    SRT_Dir: str = '/content/drive/MyDrive/%EVT/语音转录结果/...%'   #@param {type:\"string\"}\n",
        "    #@markdown **添加辅助数据**：添加用以辅助训练的数据集，若当前语音数据的质量/数量较低则建议启用\n",
        "    Add_AuxiliaryData: bool = False   #@param {type:\"boolean\"}\n",
        "    #@markdown **辅助数据文本路径**：辅助数据集的文本的所在路径\n",
        "    AuxiliaryData_Path: str = '/content/drive/MyDrive/%EVT/AuxiliaryData/VITS/AuxiliaryData.txt%'   #@param {type:\"string\"}\n",
        "    #@markdown **添加其它语言辅助数据**：启用以允许添加与当前数据集语言不匹配的辅助数据\n",
        "    Add_UnmatchedLanguage: bool = False   #@param {type:\"boolean\"}\n",
        "    #@markdown **采样率 (HZ)**：数据集所要求的音频采样率，若维持不变则保持'None'即可\n",
        "    SampleRate: int = 22050   #@param [\"None\", 22050, 44100, 48000, 96000, 192000]\n",
        "    #@markdown **采样位数**：数据集所要求的音频采样位数，若维持不变则保持'None'即可\n",
        "    SampleWidth: str = '16'   #@param [\"None\", 8, 16, 24, 32]\n",
        "    #@markdown **合并声道**：将输出音频的声道合并为单声道\n",
        "    ToMono: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **训练集占比**：划分给训练集的数据在数据集中所占的比例\n",
        "    TrainRatio: float = 0.7   #@param {type:\"number\"}\n",
        "    #@markdown **输出目录**：用于保存最后生成的数据集的目录\n",
        "    Output_Dir: str = f'/content/drive/MyDrive/EVT/数据集制作结果/{date.today()}'   #@param {type:\"string\"}\n",
        "    #@markdown **训练集文本名**：用于保存最后生成的训练集txt文件的名字\n",
        "    FileList_Name_Training: str = 'Train'   #@param {type:\"string\"}\n",
        "    #@markdown **验证集文本名**：用于保存最后生成的验证集txt文件的名字\n",
        "    FileList_Name_Validation: str = 'Val'   #@param {type:\"string\"}\n",
        "\n",
        "SRTtoCSVandSplitAudio = Dataset_Creating(\n",
        "    Execute_Dataset_Creating.SRT_Dir,\n",
        "    Execute_Dataset_Creating.AudioSpeakersData_Path,\n",
        "    Execute_Dataset_Creating.SampleRate if Execute_Dataset_Creating.SampleRate != \"None\" else None,\n",
        "    Execute_Dataset_Creating.SampleWidth if Execute_Dataset_Creating.SampleWidth != \"None\" else None,\n",
        "    Execute_Dataset_Creating.ToMono,\n",
        "    Execute_Dataset_Creating.Add_AuxiliaryData,\n",
        "    Execute_Dataset_Creating.AuxiliaryData_Path,\n",
        "    Execute_Dataset_Creating.Add_UnmatchedLanguage,\n",
        "    Execute_Dataset_Creating.TrainRatio,\n",
        "    Path(Execute_Dataset_Creating.Output_Dir).parent.__str__(),\n",
        "    Path(Execute_Dataset_Creating.Output_Dir).name,\n",
        "    Execute_Dataset_Creating.FileList_Name_Training,\n",
        "    Execute_Dataset_Creating.FileList_Name_Validation\n",
        ")\n",
        "SRTtoCSVandSplitAudio.CallingFunctions()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_VoiceTrainer - VITS2\n",
        "该工具会训练出适用于语音合成的模型文件（若在使用过程中出现报错，可以尝试先`断开连接并删除运行时`，然后重新运行 Configure Colab 部分以及本代码块）"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] VoiceTrainer 该工具会训练出适用于语音合成的模型文件\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.Train.VITS.Train import Voice_Training\n",
        "\n",
        "class Execute_Voice_Training:\n",
        "    '''\n",
        "    Preprocess and then start training\n",
        "    '''\n",
        "    #@markdown **训练集文本路径**：用于提供训练集音频路径及其语音内容的训练集txt文件的路径\n",
        "    FileList_Path_Training: str = '/content/drive/MyDrive/%EVT/数据集制作结果/Train.txt%'   #@param {type:\"string\"}\n",
        "    #@markdown **验证集文本路径**：用于提供验证集音频路径及其语音内容的验证集txt文件的路径\n",
        "    FileList_Path_Validation: str = '/content/drive/MyDrive/%EVT/数据集制作结果/Val.txt%'   #@param {type:\"string\"}\n",
        "    #@markdown **迭代次数**：将全部样本完整迭代一轮的次数\n",
        "    Epochs: int = 300   #@param {type:\"integer\"}\n",
        "    #@markdown **批处理量**：每轮迭代中单位批次的样本数量（注意：最好设置为2的幂次）\n",
        "    Batch_Size: int = 16   #@param {type:\"integer\"}\n",
        "    #@markdown **使用预训练模型**：使用预训练模型（底模），注意其载入优先级高于检查点\n",
        "    Use_PretrainedModels: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **[可选]预训练G模型路径**：预训练生成器（Generator）模型的路径\n",
        "    Model_Path_Pretrained_G: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_G.pth%'   #@param {type:\"string\"}\n",
        "    #@markdown **[可选]预训练D模型路径**：预训练判别器（Discriminator）模型的路径\n",
        "    Model_Path_Pretrained_D: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_D.pth%'   #@param {type:\"string\"}\n",
        "    #@markdown **[可选]保留原说话人**：保留底模中原有的说话人，请保证每个原角色至少有一两条音频参与训练\n",
        "    Keep_Original_Speakers: bool = False   #@param {type:\"boolean\"}\n",
        "    #@markdown **[可选]配置加载路径**：用于加载底模人物信息的配置文件的所在路径\n",
        "    Config_Path_Load: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_Config.json%'   #@param {type:\"string\"}\n",
        "    #@markdown **进程数量**：进行数据加载时可并行的进程数量\n",
        "    Num_Workers: int = 8   #@param {type:\"integer\"}\n",
        "    #@markdown **半精度训练**：通过混合了float16精度的训练方式减小显存占用以支持更大的批处理量\n",
        "    FP16_Run: bool = True   #@param {type:\"boolean\"}\n",
        "    #@markdown **评估间隔**：每次保存模型所间隔的step数\n",
        "    Eval_Interval: int = 1000   #@param {type:\"integer\"}\n",
        "    #@markdown **输出目录**：用于存放生成的模型和配置文件的目录，若目录中已存在模型则会将其视为检查点（注意：当目录中存在多个模型时，编号最大的会被选为检查点）\n",
        "    Output_Dir: str = f'/content/drive/MyDrive/EVT/模型训练结果/{date.today()}'   #@param {type:\"string\"}\n",
        "\n",
        "# Load the TensorBoard notebook extension\n",
        "%load_ext tensorboard\n",
        "# Start TensorBoard\n",
        "%tensorboard --logdir /content/drive/MyDrive/EVT/TrainResult\n",
        "\n",
        "PreprocessandTrain = Voice_Training(\n",
        "    Execute_Voice_Training.FileList_Path_Training,\n",
        "    Execute_Voice_Training.FileList_Path_Validation,\n",
        "    Execute_Voice_Training.Eval_Interval,\n",
        "    Execute_Voice_Training.Epochs,\n",
        "    Execute_Voice_Training.Batch_Size,\n",
        "    Execute_Voice_Training.FP16_Run,\n",
        "    Execute_Voice_Training.Keep_Original_Speakers,\n",
        "    Execute_Voice_Training.Config_Path_Load,\n",
        "    Execute_Voice_Training.Num_Workers,\n",
        "    Execute_Voice_Training.Use_PretrainedModels,\n",
        "    Execute_Voice_Training.Model_Path_Pretrained_G if Execute_Voice_Training.Model_Path_Pretrained_G != \"None\" else None,\n",
        "    Execute_Voice_Training.Model_Path_Pretrained_D if Execute_Voice_Training.Model_Path_Pretrained_D != \"None\" else None,\n",
        "    Path(Execute_Voice_Training.Output_Dir).parent.__str__(),\n",
        "    Path(Execute_Voice_Training.Output_Dir).name,\n",
        "    \"/content/drive/MyDrive/EVT/log\"\n",
        ")\n",
        "PreprocessandTrain.Preprocessing_and_Training()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Tool_VoiceConverter - VITS2\n",
        "该工具会将文字转为语音并生成音频文件"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title [Tool] VoiceConverter 该工具会将文字转为语音并生成音频文件\n",
        "%cd /content/Easy-Voice-Toolkit\n",
        "\n",
        "from datetime import date\n",
        "from pathlib import Path\n",
        "from EVT_Core.TTS.VITS.Convert import Voice_Converting\n",
        "\n",
        "class Execute_Voice_Converting:\n",
        "    '''\n",
        "    Convert text to speech and save as audio files\n",
        "    '''\n",
        "    #@markdown **配置加载路径**：该路径对应的配置文件会用于推理\n",
        "    Config_Path_Load: str = '/content/drive/MyDrive/%EVT/模型训练结果/Config.json%'   #@param {type:\"string\"}\n",
        "    #@markdown **G模型加载路径**：用于推理的生成器（Generator）模型所在路径\n",
        "    Model_Path_Load: str = '/content/drive/MyDrive/%EVT/模型训练结果/G_*.pth%'   #@param {type:\"string\"}\n",
        "    #@markdown **输入文字**：输入的文字会作为说话人的语音内容\n",
        "    Text: str = '请输入语句'   #@param {type:\"string\"}\n",
        "    #@markdown **所用语言**：说话人/文字所使用的语言，若使用自动检测则保持'None'即可\n",
        "    Language: str = '[ZH]'   #@param [\"None\", \"[ZH]\", \"[EN]\", \"[JA]\"]\n",
        "    #@markdown **人物名字**：说话人物的名字\n",
        "    Speaker: str = '%Name%'   #@param {type:\"string\"}\n",
        "    #@markdown **情感强度**：情感的变化程度\n",
        "    EmotionStrength: float = .667   #@param {type:\"number\"}\n",
        "    #@markdown **音素音长**：音素的发音长度\n",
        "    PhonemeDuration: float = 0.8   #@param {type:\"number\"}\n",
        "    #@markdown **整体语速**：整体的说话速度\n",
        "    SpeechRate: float = 1.0   #@param {type:\"number\"}\n",
        "    #@markdown **音频保存路径**：用于保存推理得到的音频的路径\n",
        "    Audio_Path_Save: str = f'/content/drive/MyDrive/EVT/语音合成结果/{date.today()}.wav'   #@param {type:\"string\"}\n",
        "\n",
        "VoiceConverting = Voice_Converting(\n",
        "    Execute_Voice_Converting.Config_Path_Load,\n",
        "    Execute_Voice_Converting.Model_Path_Load,\n",
        "    Execute_Voice_Converting.Text,\n",
        "    Execute_Voice_Converting.Language,\n",
        "    Execute_Voice_Converting.Speaker,\n",
        "    Execute_Voice_Converting.EmotionStrength,\n",
        "    Execute_Voice_Converting.PhonemeDuration,\n",
        "    Execute_Voice_Converting.SpeechRate,\n",
        "    Execute_Voice_Converting.Audio_Path_Save\n",
        ")\n",
        "VoiceConverting.Converting()"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "provenance": []
    },
    "kernelspec": {
      "display_name": "My_Env",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.16"
    },
    "vscode": {
      "interpreter": {
        "hash": "d638444b7179bdfc0dc1957817588c7007ff4b9946fa53f9cc2df304cc8f4127"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
